Statistical Machine Translation of Serbian-English

Authors

Maja Popović

Slobodan Jovičić

Zoran Šarić

Abstract

In this work we present the first results of a statistical approach to machine translation from Serbian into English and vice versa. The experiments are performed on the Assimil language course, a bilingual parallel corpus that consists of about 3k sentences and 20k running words from an unrestricted domain. The error rates for translation from Serbian into English are about 35-45%, and for the other direction about 45-55%. These results are comparable with those reported for other language pairs translated using statistical approaches. Reducing Serbian words to their stems decreased the error rates for translation into English by about 8% relative.
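Two points in the abstract are worth making concrete. First, one common motivation for stemming a highly inflected source language is that it collapses many surface forms onto a single stem, so a small corpus yields fewer, better-estimated vocabulary entries. Second, the 8% figure is a relative reduction, not an absolute one. The Python sketch below illustrates both under loud assumptions: the suffix list, the minimum stem length, the example words and the 40% baseline are all hypothetical, and this naive suffix stripper is not the stemming procedure used in this work.

```python
# Hypothetical suffix list for illustration only; a real Serbian stemmer
# needs far richer morphology than this.
SUFFIXES = ("om", "a", "e", "i", "u")
MIN_STEM = 3  # hypothetical minimum number of characters kept in a stem


def stem(word: str) -> str:
    """Strip the longest listed suffix, keeping at least MIN_STEM characters."""
    w = word.lower()
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(suf) and len(w) - len(suf) >= MIN_STEM:
            return w[: -len(suf)]
    return w


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which an error rate drops relative to its baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Inflected forms of "kuća" (house) and "grad" (city).
    words = ["kuća", "kuće", "kući", "kuću", "kućom",
             "grad", "grada", "gradu", "gradom"]
    surface_vocab = set(words)
    stem_vocab = {stem(w) for w in words}
    print(len(surface_vocab), "surface forms ->", len(stem_vocab), "stems")
    # Hypothetical 40% baseline reduced by 8% relative.
    print(relative_reduction(40.0, 40.0 * (1 - 0.08)))
```

With a hypothetical 40% baseline error rate, an 8% relative reduction gives 40% × 0.92 = 36.8%, which is only 3.2 percentage points in absolute terms.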

Similar resources

Towards the Use of Word Stems and Suffixes for Statistical Machine Translation

In this paper we present methods for improving the quality of translation from an inflected language into English by making use of part-of-speech tags and word stems and suffixes in the source language. Results for translations from Spanish and Catalan into English are presented on the LC-STAR trilingual corpus which consists of spontaneously spoken dialogues in the domain of travelling and app...

Augmenting a Small Parallel Text with Morpho-syntactic Language Resources for Serbian-English Statistical Machine Translation

In this work, we examine the quality of several statistical machine translation systems constructed on a small amount of parallel Serbian-English text. The main bilingual parallel corpus consists of about 3k sentences and 20k running words from an unrestricted domain. The translation systems are built on the full corpus as well as on a reduced corpus containing only 200 parallel sentences. A sm...

Identifying main obstacles for statistical machine translation of morphologically rich South Slavic languages

The best way to improve a statistical machine translation system is to identify concrete problems causing translation errors and address them. Many of these problems are related to the characteristics of the involved languages and differences between them. This work explores the main obstacles for statistical machine translation systems involving two morphologically rich and under-resourced lan...

Exploring cross-language statistical machine translation for closely related South Slavic languages

This work investigates the use of crosslanguage resources for statistical machine translation (SMT) between English and two closely related South Slavic languages, namely Croatian and Serbian. The goal is to explore the effects of translating from and into one language using an SMT system trained on another. For translation into English, a loss due to cross-translation is about 13% of BLEU and ...

Enlarging Scarce In-domain English-Croatian Corpus for SMT of MOOCs Using Serbian

Massive Open Online Courses have been growing rapidly in size and impact. Yet the language barrier constitutes a major growth impediment in reaching out all people and educating all citizens. A vast majority of educational material is available only in English, and state-of-the-art machine translation systems still have not been tailored for this peculiar genre. In addition, a mere collection o...


Publication date: 2004
